Skip to content

Always process ChannelMonitorUpdates asynchronously #1897

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

TheBlueMatt
Copy link
Collaborator

@TheBlueMatt TheBlueMatt commented Dec 3, 2022

Based on #1887, #1863, and #1886. Will require #1507, I think as well.

We currently have two codepaths on most channel update functions -
most methods return a set of messages to send a peer iff the
ChannelMonitorUpdate succeeds, but if it does not we push the
messages back into the Channel and then pull them back out when
the ChannelMonitorUpdate completes and send them then. This adds
a substantial amount of complexity in very critical codepaths.

Instead, here we swap all our channel update codepaths to
immediately set the channel-update-required flag and only return a
ChannelMonitorUpdate to the ChannelManager. Internally in the
Channel we store a queue of ChannelMonitorUpdates, which will
become critical in future work to surface pending
ChannelMonitorUpdates to users at startup so they can complete.

This also lays the groundwork for supporting fully async ChannelMonitor updates by tracking in-flight updates so that we can re-run them on startup if the ChannelManager was persisted but the monitor update did not finish.

@TheBlueMatt TheBlueMatt added this to the 0.0.114 milestone Dec 3, 2022
@TheBlueMatt TheBlueMatt changed the title 2022 11 monitor updates always async Always process ChannelMonitorUpdates asynchronously Dec 3, 2022
@TheBlueMatt
Copy link
Collaborator Author

All that to remove 100 LoC in our messaging pipelines :)

$ git diff-tree --stat origin/upstream/main HEAD lightning/src/ln/channel{manager,}.rs
 lightning/src/ln/channel.rs        |  495 ++++++-------
 lightning/src/ln/channelmanager.rs | 1342 ++++++++++++++++--------------------
 2 files changed, 862 insertions(+), 975 deletions(-)

@codecov-commenter
Copy link

codecov-commenter commented Dec 3, 2022

Codecov Report

Base: 87.41% // Head: 87.59% // Increases project coverage by +0.17% 🎉

Coverage data is based on head (d8619ba) compared to base (62a88f9).
Patch coverage: 89.23% of modified lines in pull request are covered.

❗ Current head d8619ba differs from pull request most recent head 2adb8ee. Consider uploading reports for the commit 2adb8ee to get more accurate results

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1897      +/-   ##
==========================================
+ Coverage   87.41%   87.59%   +0.17%     
==========================================
  Files         101      101              
  Lines       45633    47683    +2050     
  Branches    45633    47683    +2050     
==========================================
+ Hits        39891    41768    +1877     
- Misses       5742     5915     +173     
Impacted Files Coverage Δ
lightning/src/util/ser.rs 84.72% <ø> (+1.28%) ⬆️
lightning/src/ln/channel.rs 83.99% <85.39%> (+0.36%) ⬆️
lightning/src/ln/channelmanager.rs 89.40% <89.49%> (+1.74%) ⬆️
lightning/src/ln/chanmon_update_fail_tests.rs 97.22% <93.33%> (-0.10%) ⬇️
lightning/src/chain/chainmonitor.rs 93.00% <100.00%> (-0.16%) ⬇️
lightning/src/ln/functional_tests.rs 97.33% <100.00%> (+0.04%) ⬆️
lightning/src/ln/payment_tests.rs 95.86% <100.00%> (ø)
lightning/src/util/time.rs 38.29% <0.00%> (-1.71%) ⬇️
... and 11 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@TheBlueMatt TheBlueMatt force-pushed the 2022-11-monitor-updates-always-async branch from ab58f1c to 4b348e6 Compare December 3, 2022 19:13
@TheBlueMatt TheBlueMatt added the blocked on next release Should Wait Until Next Release To Land label Dec 5, 2022
@TheBlueMatt
Copy link
Collaborator Author

Should revisit #1886 (comment) as a part of this PR.

@TheBlueMatt TheBlueMatt force-pushed the 2022-11-monitor-updates-always-async branch from 4b348e6 to ace0660 Compare December 19, 2022 20:56
@TheBlueMatt TheBlueMatt removed the blocked on next release Should Wait Until Next Release To Land label Dec 21, 2022
@TheBlueMatt TheBlueMatt force-pushed the 2022-11-monitor-updates-always-async branch 2 times, most recently from e0bb80d to 809dabc Compare January 12, 2023 19:32
@TheBlueMatt
Copy link
Collaborator Author

Rebased, Note that the "Always process...async" commits will all need to be squashed down at the end as there are test failures introduced in between them which are fixed by completing the series. They're left separate for now for reviewability.

Comment on lines +3206 to +3217
self.pending_monitor_updates.push(monitor_update);
return Ok(self.pending_monitor_updates.last().unwrap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we don't call monitor_updating_paused here unlike most other places. What determines if it should be called?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is in the MonitorUpdateInProgress-already-set case. monitor_update_paused primarily just sets some flags and then also MonitorUpdateInProgress. This if block just manually sets those flags instead. We could use the util but there's not much reason to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha. I'm not a big fan of the util, to be honest. Seems the code would be much readable without it, given the amount of unnecessary parameters in some cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, definitely agree, guess I just kept using it cause it was already there and does "the thing". I'll remove it in a followup if that's okay.

We haven't had a `MonitorUpdateInProgress` check in
`Channel::is_live` for some time.
In the coming commits we'll move to async `ChannelMonitorUpdate`
application, which means we'll want to generate a
`ChannelMonitorUpdate` (including a new counterparty commitment
transaction) before we actually send it to our counterparty. To do
that today we'd have to actually sign the commitment transaction
by calling the signer, then drop it, apply the
`ChannelMonitorUpdate`, then re-sign the commitment transaction to
send it to our peer.

In this commit we instead split `send_commitment_no_status_check`
and `send_commitment_no_state_update` into `build_` and `send_`
variants, allowing us to generate new counterparty commitment
transactions without actually signing, then build them for sending,
with signatures, later.
When a `ChannelMonitor` update completes, we may need to take some
further action, such as exposing an `Event` to the user initiating
another `ChannelMonitor` update, etc. This commit adds the basic
structure to track such actions and serialize them as required.

Note that while this does introduce a new map which is written as
an even value which users cannot opt out of, the map is only filled
in when users use the asynchronous `ChannelMonitor` updates. As
these are still considered beta, breaking downgrades for such users
is considered acceptable here.
In order to support fully async `ChannelMonitor` updating, we need
to ensure that we can replay `ChannelMonitorUpdate`s if we shut
down after persisting a `ChannelManager` but without completing a
`ChannelMonitorUpdate` persistence. In order to support that we
(obviously) have to store the `ChannelMonitorUpdate`s in the
`ChannelManager`, which we do here inside the `Channel`.

We do so now because in the coming commits we will start using the
async persistence flow for all updates, and while we won't yet
support fully async monitor updating it's nice to get some of the
foundational structures in place now.
Over the next few commits, this macro will replace the
`handle_monitor_update_res` macro. It takes a different approach -
instead of receiving the message(s) that need to be re-sent after
the monitor update completes and pushing them back into the
channel, we'll not get the messages from the channel at all until
we're ready for them.

This will unify our message sending into only actually fetching +
sending messages in the common monitor-update-completed code,
rather than both there *and* in the functions that call `Channel`
when new messages are originated.
@TheBlueMatt TheBlueMatt force-pushed the 2022-11-monitor-updates-always-async branch from 8ba64e2 to 46d0711 Compare February 17, 2023 19:43
@TheBlueMatt
Copy link
Collaborator Author

Rebased and added one fixup.


/// Only fails in case of signer rejection. Used for channel_reestablish commitment_signed
/// generation when we shouldn't change HTLC/channel state.
fn send_commitment_no_state_update<L: Deref>(&self, logger: &L) -> Result<(msgs::CommitmentSigned, (Txid, Vec<(HTLCOutputInCommitment, Option<&HTLCSource>)>)), ChannelError> where L::Target: Logger {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: doesn't seem like we use the returned HTLCs anywhere so we could remove them from the return type

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do it in a followup? That's an existing thing, no? (plus you're about to rewrite all that code :p)

let (_, mut additional_update) = self.send_commitment_no_status_check(logger).map_err(|e| (None, e))?;
// send_commitment_no_status_check may bump latest_monitor_id but we want them to be
let mut additional_update = self.build_commitment_no_status_check(logger);
// build_commitment_no_status_check may bump latest_monitor_id but we want them to be
// strictly increasing by one, so decrement it here.
self.latest_monitor_update_id = monitor_update.update_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this track the pending queued updates as well, such that we can ensure we only resume the channel after all pending updates complete, rather than having to flap between monitor_paused and monitor_restored after each successful update?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure why this comment is on this line, but in general we currently don't per the MonitorEvent::Completed::monitor_update_id docs (which ChainMonitor does implement properly). That's mostly because we previously didn't do any pending monitor update tracking in ChannelManager, but we're going to once we persist the new monitor update queue, so we should probably change this, but it doesn't have to be now.

if let Some(monitor_update) = monitor_update_opt.take() {
let update_id = monitor_update.update_id;
let update_res = self.chain_monitor.update_channel(funding_txo_opt.unwrap(), monitor_update);
break handle_new_monitor_update!(self, update_res, update_id, peer_state_lock, peer_state, chan_entry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we only break if the monitor update didn't complete?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's even the purpose of the is_shutdown check below? We can't guarantee the closing negotiation will be done by the time we get there, and we'll only check once as we break out of the loop immediately after. We also already remove the channel once we finish negotiation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Channel::get_shutdown's docs reference it - May jump to the channel being fully shutdown (see [Self::is_shutdown]) in which case no [ChannelMonitorUpdate] will be returned).. Specifically, if the channel hasn't yet been funded then we'll jump to being fully shutdown, in which case we should just outright remove the channel. However, it looks like those are wrong! The check for whether to generate a monitor update is separate, and in face we definitely should do what the docs say! I've added another commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice. Only thing now is to ensure we don't queue a shutdown for a channel that has yet to be funded, which is a requirement in BOLT-2, but feel free to leave that for a follow-up.

@TheBlueMatt TheBlueMatt force-pushed the 2022-11-monitor-updates-always-async branch from cc9a9ef to 0c85c30 Compare February 18, 2023 21:28
@TheBlueMatt
Copy link
Collaborator Author

Rebased and addressed feedback.

Copy link
Contributor

@wpaulino wpaulino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if let Some(monitor_update) = monitor_update_opt.take() {
let update_id = monitor_update.update_id;
let update_res = self.chain_monitor.update_channel(funding_txo_opt.unwrap(), monitor_update);
break handle_new_monitor_update!(self, update_res, update_id, peer_state_lock, peer_state, chan_entry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice. Only thing now is to ensure we don't queue a shutdown for a channel that has yet to be funded, which is a requirement in BOLT-2, but feel free to leave that for a follow-up.

Copy link
Contributor

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Feel free to squash.

Comment on lines +3206 to +3217
self.pending_monitor_updates.push(monitor_update);
return Ok(self.pending_monitor_updates.last().unwrap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha. I'm not a big fan of the util, to be honest. Seems the code would be much readable without it, given the amount of unnecessary parameters in some cases.

In a previous PR, we added a `MonitorUpdateCompletionAction` enum
which described actions to take after a `ChannelMonitorUpdate`
persistence completes. At the time, it was only used to execute
actions in-line, however in the next commit we'll start (correctly)
leaving the existing actions until after monitor updates complete.
The TODO mentioned in `handle_monitor_update_res` about how we
might forget about HTLCs in case of permanent monitor update
failure still applies in spite of all our changes. If a channel is
drop'd in general, monitor-pending updates may be lost if the
monitor update failed to persist.

This was always the case, and is ultimately the general form of the
the specific TODO, so we simply leave comments there
@TheBlueMatt TheBlueMatt force-pushed the 2022-11-monitor-updates-always-async branch from 4e34ff1 to d8619ba Compare February 22, 2023 00:51
@TheBlueMatt
Copy link
Collaborator Author

Squashed the named fixups and fixed the missing doclink. Note that the Always process... commits all need to be squashed before landing, but will wait on some ACKs for that.

wpaulino
wpaulino previously approved these changes Feb 22, 2023
Copy link
Contributor

@wpaulino wpaulino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do those commits need to be squashed?

jkczyz
jkczyz previously approved these changes Feb 22, 2023
We currently have two codepaths on most channel update functions -
most methods return a set of messages to send a peer iff the
`ChannelMonitorUpdate` succeeds, but if it does not we push the
messages back into the `Channel` and then pull them back out when
the `ChannelMonitorUpdate` completes and send them then. This adds
a substantial amount of complexity in very critical codepaths.

Instead, here we swap all our channel update codepaths to
immediately set the channel-update-required flag and only return a
`ChannelMonitorUpdate` to the `ChannelManager`. Internally in the
`Channel` we store a queue of `ChannelMonitorUpdate`s, which will
become critical in future work to surface pending
`ChannelMonitorUpdate`s to users at startup so they can complete.

This leaves some redundant work in `Channel` to be cleaned up
later. Specifically, we still generate the messages which we will
now ignore and regenerate later.

This commit updates the `ChannelMonitorUpdate` pipeline across all
the places we generate them.
In the previous commit, we moved all our `ChannelMonitorUpdate`
pipelines to use a new async path via the
`handle_new_monitor_update` macro. This avoids having two message
sending pathways and simply sends messages in the "monitor update
completed" flow, which is shared between sync and async monitor
updates.

Here we reuse the new macro for handling `funding_signed` messages
when doing an initial `ChannelMonitor` persistence. This provides
a similar benefit, simplifying the code a trivial amount, but
importantly allows us to fully remove the original
`handle_monitor_update_res` macro.
Building on the previous commits, this finishes our transition to
doing all message-sending in the monitor update completion
pipeline, unifying our immediate- and async- `ChannelMonitor`
update and persistence flows.
The `Channel::get_shutdown` docs are very clear - if the channel
jumps to `Shutdown` as a result of not being funded when we go to
initiate shutdown we should not generate a `ChannelMonitorUpdate`
as there's no need to bother with the shutdown script - we're
force-closing anyway.

However, this wasn't actually implemented, potentially causing a
spurious monitor update for no reason.
@TheBlueMatt TheBlueMatt dismissed stale reviews from jkczyz and wpaulino via 2adb8ee February 22, 2023 17:34
@TheBlueMatt TheBlueMatt force-pushed the 2022-11-monitor-updates-always-async branch from d8619ba to 2adb8ee Compare February 22, 2023 17:34
@TheBlueMatt
Copy link
Collaborator Author

Sadly some tests pass in the intermediary commits because the new code assumes that all ChannelMonitorUpdates are in the new Channel monitor update queue, and in the intermediary commits only some of them are. I squashed the monitor update commits with no changes, should be good to land.

@TheBlueMatt TheBlueMatt merged commit 96c8507 into lightningdevkit:main Feb 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants